Structural Sentence Similarity Estimation for Short Texts

Authors

  • Weicheng Ma
  • Torsten Suel
Abstract

Sentence similarity is the basis of most text-related tasks. In this paper, we define a new task of sentence similarity estimation specifically for short, informal, social-network-styled sentences. The new type of sentence similarity, which we call Structural Similarity, eliminates syntactic or grammatical features such as dependency paths and Part-of-Speech (POS) tagging, which are not sufficiently representative on short sentences. Structural Similarity does not consider the actual meanings of the sentences either, but puts more emphasis on the similarities of sentence structures, so as to discover purpose- or emotion-level similarities. The idea is based on the observation that people tend to use sentences with similar structures to express similar feelings. Besides the definition, we present a new feature set and a mechanism to calculate the scores, and, to disambiguate word senses, we propose a variant of the Word2Vec model to represent words. We demonstrate the correctness and advancement of our sentence similarity measurement through experiments.

Similar Resources

Sentence Similarity based on Relevance

The issue about sentence similarity is essential to many areas of artificial intelligence. Although there are related studies on determining text similarity, fewer publications are about the similarity between short texts especially about the similarity between sentence pairs. In this paper, a novel method is proposed to estimate the sentence similarity with consideration of direct relevance an...

Measuring the sentence level similarity

This article describes a method used to calculate the similarity between short English texts, specifically of sentence length. The described algorithm calculates semantic and word order similarities of two sentences. In order to do so, it uses a structured lexical knowledge base and statistical information from a corpus. The described method works well in determining sentence similarity for mos...

A Method for Measuring Sentence Similarity and Its Application to Conversational Agents

This paper presents a novel algorithm for computing similarity between very short texts of sentence length. It will introduce a method that takes account of not only semantic information but also word order information implied in the sentences. Firstly, semantic similarity between two sentences is derived from information from a structured lexical database and from corpus statistics. Secondly, ...

Short Text Similarity Measure Based on Double Vector Space Model

Short text similarity measure is the basis of classification and duplicate checking of the short texts. Allowing for the insufficient consideration of the sentence semantic and structure information in similarity calculation between two short texts, we propose a novel method of short text similarity calculation based on double vector space model on the basis of traditional vector space model. C...

Semantic Similarity of Short Texts

This paper presents a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. In this paper, we focus on computing the sim...

Publication date: 2016